Skip to content

Lifei/agent browser#8032

Open
lifeizhou-ap wants to merge 29 commits intomainfrom
lifei/agent-browser
Open

Lifei/agent browser#8032
lifeizhou-ap wants to merge 29 commits intomainfrom
lifei/agent-browser

Conversation

@lifeizhou-ap
Copy link
Collaborator

Summary

Testing

Related Issues

Relates to #ISSUE_ID
Discussion: LINK (if any)

Screenshots/Demos (for UX changes)

Before:

After:

* main: (22 commits)
  feat: add gemini-acp provider, update docs on subscription models + improvements to codex (#8000)
  fix(openai): use Responses API for gpt-5.4 (#7982)
  Remove lead/worker provider (#7989)
  chore(release): release version 1.28.0 (#7991)
  Fix empty tool results from resource content (e.g. auto visualiser) (#7866)
  Separate SSE streaming from POST work submission (#7834)
  fix: include token usage in Databricks streaming responses (#7959)
  Optimize tool summarization (#7938)
  fix: overwrite the deprecated googledrive extension config (#7974)
  refactor: remove unnecessary Arc<Mutex> from tool execution pipeline (#7979)
  Revert message flush & test (#7966)
  docs: add Remote Access section with Telegram Gateway documentation (#7955)
  fix: update webmcp blog post metadata image URL (#7967)
  fix: clean up OAuth token cache on provider deletion (#7908)
  fix: hard-coded tool call id in code mode callback (#7939)
  Fix SSE parsers to accept optional space after data: prefix (#7929)
  docs: add GOOSE_INPUT_LIMIT to config-files.md (#7961)
  Add WebMCP for Beginners blog post (#7957)
  Fix download manager (#7933)
  Improve the formatting of tool calls, show thinking, treat Reasoning and Thinking as the same thing (sorry Kant) (#7626)
  ...
* main: (37 commits)
  fix: handle reasoning content blocks in OpenAI-compat streaming parser (#8078)
  chore(acp): build native packages on latest mac (#8075)
  Display delegate sub agents logs in UI (#7519)
  Update tar version to avoid CVE-2026-33056 (#8073)
  refactor: consolidate duplicated dependencies into workspace (#8041)
  tui: set up for publishing via github actions (#8020)
  feat: feature-gate local inference dependencies (#7976)
  feat: ability to manage sub recipes in desktop ui (#6360)
  Tweak the release process: no more merge to main (#7994)
  fix: gemini models via databricks (#8042)
  feat(apps): Pass toolInfo to MCP Apps via hostContext (#7506)
  fix: remove configured marker when deleting oauth provider configuration (#7887)
  docs: add vmware-aiops MCP extension documentation (#8055)
  Show setup instructions for ACP providers in settings modal (#8065)
  deps: replace sigstore-verification with sigstore-verify to kill vulns (#8064)
  feat(acp): add session/set_config and stabilize list, delete and close (#7984)
  docs: Correct `gosoe` typo to `goose` (#8062)
  fix: use default provider and model when provider in session no longer exists (#8035)
  feat: add GOOSE_SHELL env var to configure preferred shell (#7909)
  fix(desktop): fullscreen header bar + always-visible close controls (#8033)
  ...
@lifeizhou-ap lifeizhou-ap marked this pull request as ready for review March 24, 2026 01:36
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1fc2bf84b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


STEP=$((i + 1))
echo "[$STEP/$TOTAL] agent-browser ${GLOBAL_ARGS[*]} ${ARGS[*]} --timeout $CMD_TIMEOUT_MS"
if ! timeout "$CMD_TIMEOUT" pnpm exec agent-browser "${GLOBAL_ARGS[@]}" "${ARGS[@]}"; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Respect command-specific timeouts in replay

replay.sh hard-caps every agent-browser command to 10 seconds via timeout "$CMD_TIMEOUT" ..., so recorded steps that intentionally request longer waits (for example wait --timeout 60000/120000 in the new extension recordings) will still be killed at 10s. This makes long-running LLM/tool steps fail even when the recording specifies a larger timeout, causing deterministic false failures in the new E2E workflow.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// In e2e test mode, auto-confirm dialogs so CDP-based tests can proceed
// without needing to interact with native OS dialogs.
if (process.env.ENABLE_PLAYWRIGHT) {
return { response: 1 };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return the actual affirmative dialog response in E2E mode

In ENABLE_PLAYWRIGHT mode this handler always returns { response: 1 }, but several callers treat index 0 as the confirm action (e.g., KeyboardShortcutsSection conflict/reset dialogs). That means these dialogs are auto-cancelled instead of auto-confirmed during E2E runs, so test behavior diverges from user behavior and some flows become untestable.

Useful? React with 👍 / 👎.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bef76bc45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +23 to +25
PORT=$((9300 + RANDOM % 100))
if ! lsof -i :"$PORT" &>/dev/null; then
echo "$PORT"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reserve CDP ports atomically across parallel workers

pick_port does a check-then-use (lsof then echo) on a shared 9300–9399 range, but e2e-run-all.sh launches multiple tests concurrently (xargs -P "$WORKERS"). Two workers can select the same still-unbound port, which makes one session connect to the wrong Electron instance or fail nondeterministically once one process binds first. This creates cross-test contamination and flaky CI outcomes in the new parallel runner.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed this

Comment on lines +86 to +90
if ! CDP_PORT=$(wait_for_app "$TEST_NAME"); then
local DURATION=$(( SECONDS - START_TIME ))
echo "[$TEST_NAME] FAIL — app did not start within 30s (${DURATION}s)"
echo "FAIL ${DURATION}s" > "$STATUS_DIR/$TEST_NAME"
return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Always stop the test session on startup timeout

When app startup does not become ready within 30s, run_one returns early before reaching the teardown call (e2e-stop.sh). Since the app was launched in a detached screen session, this path can leave orphaned Electron/goosed processes and stale session artifacts, which then interfere with subsequent recordings in the same run (e.g., port conflicts and residual state).

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

@lifeizhou-ap lifeizhou-ap Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2dcb343a45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +21 to +23
if [[ -n "$CDP_PORT" ]]; then
lsof -ti :"$CDP_PORT" 2>/dev/null | xargs kill -9 2>/dev/null || true
pkill -9 -f "remote-debugging-port=$CDP_PORT" 2>/dev/null || true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop goosed cleanly instead of SIGKILLing Electron

This teardown path force-kills the CDP listener with kill -9, which bypasses Electron shutdown hooks that normally call goosedResult.cleanup() in main.ts. Because goosed is spawned as goosed agent (no session path in argv), the later pattern kill in e2e-stop.sh does not reliably target it, so tests can leave orphan backend processes running and contaminate subsequent runs.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kill -9 is intentional to avoid hanging during cleanup. Orphan goosed processes are swept by pkill -9 -f "$BASE_DIR" at the end of e2e-stop.sh. We haven't seen orphan contamination in practice. Will revisit if it becomes an issue.

trap 'echo ""; echo "Interrupted, stopping active E2E sessions..."; cleanup_and_exit 130' INT TERM

# Run recordings in parallel with worker limit
printf '%s\n' "${RECORDINGS[@]}" | xargs -P "$WORKERS" -I {} bash -c "run_one '{}' '$STATUS_DIR'" || true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate parallel workers when single-instance lock applies

The runner always executes recordings in parallel, but the app still enforces app.requestSingleInstanceLock() on non-macOS in main.ts. On Linux/Windows, only one Electron instance can remain alive, so additional workers will fail startup and be reported as test failures; this needs a platform guard (or forcing --workers 1) to avoid deterministic false negatives outside macOS.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only supports macos at the moment

Comment on lines +102 to +103
timeout "$TIMEOUT" bash "$SCRIPT_DIR/replay.sh" \
"$RECORDING" \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Respect scenario time budgets before test-level timeout

Each recording is wrapped in a fixed timeout "$TIMEOUT" (default 120s), but new recordings include waits up to 120s after earlier long waits (for example, extension-install-playwright.batch.json has 60s + 60s + 120s waits). This causes the harness to terminate otherwise valid long-running scenarios before their own declared timeouts elapse, producing avoidable TIMEOUT failures.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduced timeout in the test

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8281334588

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const goosedResult = await startGoosed({
serverSecret,
dir: dir || os.homedir(),
dir: process.env.GOOSE_WORKING_DIR || dir || os.homedir(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve requested chat directory over env default

createChat now always prefers process.env.GOOSE_WORKING_DIR over the per-window dir argument, so any flow that passes a specific directory (e.g., deep links/new-window creation with dir) is silently ignored whenever that env var is set. This regresses directory-scoped behavior in those launches: users/tests can request a different folder, but startGoosed still starts in the env folder. Restricting the env override to initial defaulting (or to explicit E2E mode) avoids breaking dir-driven window creation.

Useful? React with 👍 / 👎.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0430864b99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


pkill -9 -f "$BASE_DIR" 2>/dev/null || true
pkill -9 -f 'agent-browser-chrome' 2>/dev/null || true
pkill -9 -f 'agent-browser-darwin\|agent-browser-linux' 2>/dev/null || true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use ERE alternation when killing agent-browser helpers

pkill patterns are interpreted as regular expressions (BSD/FreeBSD pgrep/pkill docs describe these as extended regex), so alternation should be |, not \|. With the current pattern, this command matches a literal pipe and does not match agent-browser-darwin or agent-browser-linux, leaving helper processes alive after teardown; those leftovers can interfere with later E2E runs on the same machine.

Useful? React with 👍 / 👎.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0c9c54d77

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

rm -f "$BASE_DIR/.port-locks/$CDP_PORT"
fi

pkill -9 -f "$SESSION_DIR" 2>/dev/null || true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Kill goosed explicitly for per-session teardown

This teardown path does not reliably terminate the backend process after force-killing Electron. stop_session only pattern-kills by "$SESSION_DIR", but startGoosed launches goosed as goosed agent with the session path only in cwd (not argv), so that match can miss; and in the normal runner path we call e2e-stop.sh "$TEST_NAME", which exits before the global fallback sweeps at the bottom of the script. Fresh evidence: ui/desktop/src/goosed.ts uses spawnArgs = ['agent'] and cwd: workingDir, so argv-based pkill -f "$SESSION_DIR" is not a reliable cleanup mechanism. When this miss happens, orphan goosed processes can leak into later tests and cause cross-test contamination/flakes.

Useful? React with 👍 / 👎.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 278f91582c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

uses: Swatinem/rust-cache@42dc69e1aa15d09112580998cf2ef0119e2e91ae # v2

- name: Install GNU timeout (if missing)
run: command -v timeout || brew install coreutils

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Make timeout available on macOS before running E2E

On macos-latest, this step only runs brew install coreutils when timeout is missing, but the replay harness later calls timeout directly for every command. Homebrew coreutils installs gtimeout unless the gnubin path is added, so in the common case where timeout is absent the E2E run can still fail with timeout: command not found before tests execute. Please either export the coreutils gnubin directory (or symlink) here, or update the scripts to fall back to gtimeout.

Useful? React with 👍 / 👎.

Comment on lines +32 to +36
gh run download {run_id} --repo block/goose --name e2e-test-results --dir /tmp/e2e-results 2>&1
# example: result_url: https://github.com/block/goose/actions/runs/23487840554/artifacts/6079302235
# gh run download 23487840554 --repo block/goose --name e2e-test-results --dir /tmp/e2e-results 2>&1
rm -rf ui/desktop/tests/e2e-tests/results-ci
unzip -o /tmp/e2e-results.zip -d ui/desktop/tests/e2e-tests/results-ci

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Drop nonexistent unzip step after gh artifact download

When results_url is provided, these instructions call gh run download ... --dir /tmp/e2e-results and then immediately unzip /tmp/e2e-results.zip. gh run download already extracts artifacts into the target directory, so this zip file is not produced; the extra unzip command will fail and block Step 1 for CI-artifact debugging runs.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant